# Multimodal Task Processing
Openvla 7b Oft Finetuned Libero Spatial
MIT
OpenVLA-OFT is an optimized vision-language-action model that uses fine-tuning to significantly improve inference speed and task success rate over the base OpenVLA model.
Multimodal Fusion
Transformers

moojink
2,513
3
Vitucano 2b8 V1
Apache-2.0
ViTucano is the first visual assistant natively pre-trained in Portuguese. It combines visual understanding with language capabilities and is suited to multimodal tasks such as image captioning and visual question answering.
Image-to-Text
Transformers Other

TucanoBR
86
5